Abstract. - Analyzing football score data with statistical techniques, we investigate how the 
highly co-operative nature of the game is reflected in averaged properties such as the distributions 
of scored goals for the home and away teams. It turns out that in particular the tails of the 
distributions are not well described by independent Bernoulli trials, but rather well modeled by 
negative binomial or generalized extreme value distributions. To understand this behavior from 
first principles, we suggest modifying the Bernoulli random process to include a simple component 
of self-affirmation, which seems to describe the data surprisingly well and allows one to interpret the 
observed deviation from Gaussian statistics. The phenomenological distributions used before can 
be understood as special cases within this framework. We analyzed historical football score data 
from many leagues in Europe as well as from international tournaments and found the proposed 
models to be applicable rather universally. In particular, here we compare men's and women's 
leagues as well as the separate German leagues during the Cold War and find some remarkable 
differences. 



Football (soccer) is one of the most popular sports
worldwide, attracting millions of spectators each year. Its
popularity and economic importance have also captivated
scientists from many fields, for instance in the attempt to
improve game tactics. Much less effort has been devoted,
it seems, to the understanding of football (and other
ball sports) from the perspective of the stochastic behavior
of co-operative "agents" (i.e., players) in abstract models.
Such problems recently have come into the focus of physi- 
cists in the hope that the model-based point-of-view and 
methodological machinery of statistical mechanics might 
add a new perspective to the much more detailed investi- 
gations of more specific disciplines [1,2]. Some reports of 
such research are collected in Ref. [3]. Score distributions 
of ball games have been occasionally considered by statis- 
ticians [4-7]. Very small data sets were initially found 
to be reasonably well described by the simplest Poisso- 
nian model resulting from constant and independent scor- 
ing probabilities [4]. Including more data, however, bet- 
ter phenomenological fits were achieved with models such 
as the negative binomial distribution (NBD), which can 
be constructed from a mixture of independent Poissonian 
processes [6], or even with models of generalized extreme
value (GEV) statistics [7,8], which are particularly suited
for heavy-tailed distributions. This yielded a rather
inhomogeneous and purely phenomenological picture, without
offering any microscopic justification. We argue that the
crucial ingredient missed in previous studies is the
correlation between subsequent scoring events.

In a broader context, this problem of extremes is of ob- 
vious importance, for instance, in actuarial mathematics 
and engineering, but the corresponding distributions with 
fat tails also occur in many physics fields, ranging from 
the statistical mechanics of regular and disordered systems
[9-12] and turbulence [13] to earthquake data [14].
In these cases often average properties were considered 
instead of explicit extremes, and the empirical occurrence 
of heavy-tailed distributions led to speculations about hid- 
den extremal processes, most of which could not be iden- 
tified, though. It was only realized recently that GEV 
distributions can also arise naturally as the statistics of 
sums of correlated random variables [15], which could ex- 
plain their ubiquity in nature. 

For the specific example of scoring in football, corre- 
lations naturally occur through processes of feedback of 
scoring on both teams, and we shall see how the introduction
of simple rules for the adaptation of the success
probabilities in a modified Bernoulli process upon scor- 
ing a goal leads to systematic deviations from Gaussian 
statistics. We find simple models with a single parame- 
ter of self-affirmation to best describe the available data, 
including cases with relatively poor fits of the NBD. The 
latter is shown to result from one of these models in a par- 
ticular limit, explaining the relatively good fits observed 
before. 

To investigate the importance of correlations, we con- 
sider the distributions of goals scored by the home and 
away teams in football league or cup matches. To the 
simplest possible approximation, both teams have inde- 
pendent and constant (small) probabilities of scoring dur- 
ing each appropriate time interval of the match, such that 
the resulting final scores n follow a Poisson distribution, 

$$P_\lambda(n) = \frac{\lambda^n}{n!}\, e^{-\lambda}, \qquad (1)$$

where $\lambda = \langle n \rangle$. Here and in the following,
separate parameters are chosen for the scores of the home and
away teams. Clearly, this is a gross over-simplification of the
situation. Averaging over the matches during one or several
seasons, one might expect a distribution of scoring
probabilities $\lambda$ depending on the different skills of the
teams, the lineup for the match etc., leading to the notion of a
compound Poisson distribution. For the special case of $\lambda$
following a gamma distribution $f(\lambda)$, the resulting
compound distribution is a NBD [16],

$$P_{r,p}(n) = \int_0^\infty d\lambda\, P_\lambda(n)\, f(\lambda) = \frac{\Gamma(r+n)}{n!\,\Gamma(r)}\, p^n (1-p)^r. \qquad (2)$$
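
The compounding step can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes a gamma density with shape r and a hypothetical scale parameter `theta`, for which the mixture reproduces the NBD of Eq. (2) with p = theta/(1 + theta). The values r = 3 and theta = 0.5 are purely illustrative.

```python
import math

def poisson_pmf(n, lam):
    """Poisson probability of n goals for mean lam, as in Eq. (1)."""
    return math.exp(n * math.log(lam) - lam - math.lgamma(n + 1))

def gamma_pdf(lam, r, theta):
    """Gamma density with shape r and scale theta (theta is an assumed name)."""
    return math.exp((r - 1) * math.log(lam) - lam / theta
                    - math.lgamma(r) - r * math.log(theta))

def nbd_pmf(n, r, p):
    """Negative binomial distribution in the form of Eq. (2)."""
    log_coeff = math.lgamma(r + n) - math.lgamma(n + 1) - math.lgamma(r)
    return math.exp(log_coeff + n * math.log(p) + r * math.log(1.0 - p))

def mixed_poisson_pmf(n, r, theta, upper=40.0, steps=40000):
    """Midpoint-rule estimate of the integral of P_lambda(n) f(lambda)."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        lam = (i + 0.5) * h  # midpoints avoid lam = 0, where log() fails
        total += poisson_pmf(n, lam) * gamma_pdf(lam, r, theta)
    return total * h

# Illustrative parameters only; they are not fitted to any league data.
R, THETA = 3.0, 0.5
P = THETA / (1.0 + THETA)
max_gap = max(abs(mixed_poisson_pmf(n, R, THETA) - nbd_pmf(n, R, P))
              for n in range(8))
```

Here `max_gap` is the largest difference between the numerically integrated mixture and the closed form over the first eight score values; it should be close to zero up to integration error.
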
The NBD form has been found to describe football score
data rather well [6,7]. It appears rather ad hoc, however,
to assume that $f(\lambda)$ follows a gamma form, and when fitting
different seasons of our data with the Poissonian model
(1), the resulting distribution of $\lambda$ does not resemble a
gamma distribution. As a phenomenological alternative 
to the NBD, Greenhough et al. [7] considered fits of the 
GEV distributions 

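$$P_{\xi,\mu,\sigma}(n) = \frac{1}{\sigma}\left(1 + \xi\,\frac{n-\mu}{\sigma}\right)^{-1-1/\xi} \exp\left[-\left(1 + \xi\,\frac{n-\mu}{\sigma}\right)^{-1/\xi}\right] \qquad (3)$$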

to the data, obtaining clearly better fits than with the 
NBD in some cases. Depending on the value of the parameter
$\xi$, these distributions are known as Weibull ($\xi < 0$),
Gumbel ($\xi \to 0$) and Fréchet ($\xi > 0$) distributions,
respectively [8].
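
For orientation, the three regimes can be explored with a short sketch of the standard three-parameter GEV density. The names `mu` (location) and `sigma` (scale) are assumed labels for the two parameters besides the shape `xi`, and the numerical values are illustrative only.

```python
import math

def gev_pdf(x, xi, mu, sigma):
    """Standard GEV density; xi < 0 Weibull, xi -> 0 Gumbel, xi > 0 Frechet."""
    if abs(xi) < 1e-12:
        # Gumbel limit of the general expression
        z = (x - mu) / sigma
        return math.exp(-z - math.exp(-z)) / sigma
    t = 1.0 + xi * (x - mu) / sigma
    if t <= 0.0:
        return 0.0  # outside the support of the distribution
    return t ** (-1.0 - 1.0 / xi) * math.exp(-t ** (-1.0 / xi)) / sigma

def integrate_pdf(xi, mu, sigma, lo=-10.0, hi=13.0, steps=23000):
    """Midpoint-rule normalization check over a finite window."""
    h = (hi - lo) / steps
    return h * sum(gev_pdf(lo + (i + 0.5) * h, xi, mu, sigma)
                   for i in range(steps))

# A small |xi| should be indistinguishable from the Gumbel form.
gumbel_gap = abs(gev_pdf(2.0, 1e-6, 1.0, 1.0) - gev_pdf(2.0, 0.0, 1.0, 1.0))
# Weibull-type example (xi < 0) has bounded support and should integrate to 1.
weibull_norm = integrate_pdf(-0.1, 1.2, 1.1)
```

Note that the density is continuous in its argument, whereas goal counts are integers, so a fit to score histograms evaluates it at n = 0, 1, 2, ... only.
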

In the present context of scoring in football, goals are 
likely not independent events but, instead, scoring cer- 
tainly has a profound feedback on the motivation and 
possibility of subsequent scoring of both teams (via direct 
motivation/demotivation of the players, but also, e.g., by 
a strengthening of defensive play in case of a lead). Such 
feedback can be taken into account starting from a simple
Bernoulli model: consider a match divided into, e.g.,
$N = 90$ time steps with both teams having the possibility
to score in each unit with a probability $p = p(n)$ depending
on the number $n$ of goals scored so far. Several possibilities
arise. For our model "A", upon each goal the scoring
probability is modified as $p(n) = p(n-1) + \kappa$, with some
fixed constant $\kappa$. Alternatively, one might consider a
multiplicative modification rule, $p(n) = \kappa\, p(n-1)$, which we
refer to as model "B". Finally, in our model "C" the
assumption of independence of the scoring of the two teams
is relaxed by coupling the adaptation rules, namely by setting
$p_h(n) = p_h(n-1)\,\kappa_h$, $p_a(n) = p_a(n-1)/\kappa_a$ upon a
goal of the home (h) team, and vice versa for an away (a)
goal. If both teams have $\kappa > 1$, this results in an incentive
for the scoring team and a demotivation for the opponent,
but a value $\kappa < 1$ is conceivable as well. The resulting,
distinctly non-Gaussian distributions $P_N(n)$ for the total
number of goals scored by one team can be computed exactly
for models "A" and "B" from a Pascal recurrence
relation [17],

$$P_N(n) = [1 - p(n)]\, P_{N-1}(n) + p(n-1)\, P_{N-1}(n-1), \qquad (4)$$

where $p(n) = p_0 + \kappa n$ (model "A") or $p(n) = p_0 \kappa^n$ (model
"B"). Model "C" can be treated similarly [17].

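The recurrence (4) is straightforward to iterate. The sketch below is one possible implementation, not the authors' code: it starts from zero goals with certainty and applies the update N = 90 times. Clipping the probability at 1 and the parameter values used are assumptions made for illustration.

```python
def model_a(p0, kappa):
    """Additive feedback: p(n) = p0 + kappa * n, clipped to at most 1."""
    return lambda n: min(1.0, p0 + kappa * n)

def model_b(p0, kappa):
    """Multiplicative feedback: p(n) = p0 * kappa**n, clipped to at most 1."""
    return lambda n: min(1.0, p0 * kappa ** n)

def score_distribution(p_of_n, n_steps=90):
    """Return [P_N(0), ..., P_N(N)] by iterating the recurrence of Eq. (4)."""
    dist = [1.0] + [0.0] * n_steps  # before the first step: zero goals
    for _ in range(n_steps):
        new = [0.0] * (n_steps + 1)
        for n in range(n_steps + 1):
            stay = (1.0 - p_of_n(n)) * dist[n]  # no goal in this time step
            score = p_of_n(n - 1) * dist[n - 1] if n > 0 else 0.0
            new[n] = stay + score
        dist = new
    return dist

# Illustrative parameters only; they are not taken from Table 1.
dist_plain = score_distribution(model_a(0.015, 0.0))  # no feedback: binomial
dist_a = score_distribution(model_a(0.015, 0.002))
dist_b = score_distribution(model_b(0.015, 1.2))
mean_a = sum(n * p for n, p in enumerate(dist_a))
mean_b = sum(n * p for n, p in enumerate(dist_b))
```

With the feedback switched off (kappa = 0 in model "A" or kappa = 1 in model "B") the recurrence reduces to the ordinary binomial distribution, which provides a simple consistency check.
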
It is remarkable that this rather simple class of feedback 
models leads to a microscopic interpretation of the NBD 
in (2) which, in fact, can be shown to be the continuum 
limit of $P_N(n)$ for model "A", i.e., $N \to \infty$ with $p_0 N$ and
$\kappa N$ kept fixed [17]. For the NBD parameters one finds
that $r = p_0/\kappa$ and $p = 1 - e^{-\kappa N}$, such that a good fit of
a NBD to the data can be understood from the effect of 
self-affirmation of the teams or players, the major ingre- 
dient of our microscopic models "A", "B", and "C". Ad- 
ditionally, a certain type of continuous microscopic model 
with feedback can be shown to result in a GEV distribu- 
tion [15,17], such that all different types of deviations from 
the Gaussian form occurring here can be understood from 
the correlations introduced by feedback. 
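
The continuum limit can be illustrated numerically. The sketch below assumes the relations r = p_0/kappa and p = 1 - exp(-kappa N) and compares model "A" with the corresponding NBD at two values of N while the products p_0 N and kappa N are held fixed. The fixed values 1.35 and 0.18 and the truncation at 40 goals are illustrative choices.

```python
import math

def model_a_distribution(a, b, n_steps, n_max=40):
    """P_N(n) for n = 0..n_max with p0 = a / N and kappa = b / N."""
    p0, kappa = a / n_steps, b / n_steps
    probs = [min(1.0, p0 + kappa * n) for n in range(n_max + 1)]
    dist = [1.0] + [0.0] * n_max
    for _ in range(n_steps):
        new = [(1.0 - probs[0]) * dist[0]]
        for n in range(1, n_max + 1):
            new.append((1.0 - probs[n]) * dist[n]
                       + probs[n - 1] * dist[n - 1])
        dist = new  # weight leaking beyond n_max is negligible here
    return dist

def nbd_pmf(n, r, p):
    """Negative binomial distribution in the form of Eq. (2)."""
    log_coeff = math.lgamma(r + n) - math.lgamma(n + 1) - math.lgamma(r)
    return math.exp(log_coeff + n * math.log(p) + r * math.log(1.0 - p))

def max_deviation(a, b, n_steps, n_max=40):
    """Largest pointwise gap between model "A" and its limiting NBD."""
    r = a / b                 # r = p0 / kappa
    p = 1.0 - math.exp(-b)    # p = 1 - exp(-kappa * N)
    dist = model_a_distribution(a, b, n_steps, n_max)
    return max(abs(dist[n] - nbd_pmf(n, r, p)) for n in range(n_max + 1))

A_FIXED, B_FIXED = 1.35, 0.18  # p0 * N and kappa * N, illustrative values
err_90 = max_deviation(A_FIXED, B_FIXED, 90)
err_5000 = max_deviation(A_FIXED, B_FIXED, 5000)
```

The deviation at N = 90 is small but finite and shrinks as N grows, in line with the statement that the NBD is recovered only in the limit.
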

We now confront these models with empirical data sets, 
starting with football matches played in German leagues, 
namely the "Bundesliga" (men's premier league (West)
Germany, 1963/1964 - 2004/2005, ≈ 12 800 matches),
the "Oberliga" (men's premier league East Germany,
1949/1950 - 1990/1991, ≈ 7700 matches), and the
"Frauen-Bundesliga" (women's premier league Germany,
1997/1998 - 2004/2005, ≈ 1050 matches) [18]. We determined
histograms estimating the probability density functions
(PDFs) $P^h(n_h)$ and $P^a(n_a)$ of the final scores of
the home and away teams, respectively [19]. Error es- 
timates on the histogram bins were computed with the 
bootstrap resampling method. This allows the judgment 
of the quality of the various fits collected in Table 1 by 
monitoring their goodness, i.e., the $\chi^2$ per degree of freedom,
$\chi^2/\mathrm{d.o.f.}$, naturally taking into account the different
numbers of free parameters in the fits considered. 
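
The error estimation described here can be sketched as follows. This is an assumed, minimal variant of bootstrap resampling: matches are redrawn with replacement and the spread of each histogram bin across the replicas is taken as its error. The score sample is synthetic and only serves to make the example runnable.

```python
import math
import random

def histogram(scores, max_goals):
    """Normalized histogram; scores above max_goals are ignored."""
    counts = [0] * (max_goals + 1)
    for s in scores:
        if s <= max_goals:
            counts[s] += 1
    return [c / len(scores) for c in counts]

def bootstrap_errors(scores, max_goals, n_boot=200, seed=0):
    """Standard deviation of each histogram bin over bootstrap resamples."""
    rng = random.Random(seed)
    sums = [0.0] * (max_goals + 1)
    sq_sums = [0.0] * (max_goals + 1)
    for _ in range(n_boot):
        resample = rng.choices(scores, k=len(scores))  # draw with replacement
        for i, h in enumerate(histogram(resample, max_goals)):
            sums[i] += h
            sq_sums[i] += h * h
    errors = []
    for s, sq in zip(sums, sq_sums):
        mean = s / n_boot
        errors.append(math.sqrt(max(0.0, sq / n_boot - mean * mean)))
    return errors

def chi2_per_dof(observed, errors, model, n_params):
    """Chi-square per degree of freedom over bins with nonzero error."""
    used = [(o, e, m) for o, e, m in zip(observed, errors, model) if e > 0]
    chi2 = sum(((o - m) / e) ** 2 for o, e, m in used)
    return chi2 / (len(used) - n_params)

# Synthetic scores (hypothetical weights), not real league data.
sample_rng = random.Random(1)
scores = sample_rng.choices(list(range(6)), weights=[25, 35, 22, 11, 5, 2], k=2000)
observed = histogram(scores, 5)
errors = bootstrap_errors(scores, 5)
```

Dividing by the number of bins minus the number of fitted parameters is what allows fits with different numbers of free parameters to be compared on the same footing.
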

We first considered fits of the PDFs of the phenomenological
descriptions (1), (2), and (3). Not to our surprise,
Table 1: Fits and their $\chi^2$ per degree of freedom, $\chi^2/\mathrm{d.o.f.}$, of the phenomenological distributions (1), (2), and (3) as well 
as fits of our microscopic feedback models "A" and "B" to the data for the East German "Oberliga", the (West) German men's 
premier league "Bundesliga", the German women's premier league "Frauen-Bundesliga" and the qualification stages of all past 
"FIFA World Cups". 
and in accordance with previous findings [6,7], the simple 
Poissonian ansatz (1) is not found to be an adequate de- 
scription for any of the data sets. Deviations occur here 
mainly in the tails with large numbers of goals which in 
general are found to be fatter than can be accommodated 
by a Poissonian model. On the contrary, the NBD form (2) 
models all of the above data well as is illustrated in Fig. 1. 
Considering the fits of the GEV distributions (3), we find 
that extreme value statistics are in general also a reasonably
good description of the data. The shape parameter
$\xi$ is always found to be small in modulus and negative in
the majority of the cases, indicating a distribution of the
Weibull type (which is in agreement with the findings of
Ref. [7] for different leagues). Fixing $\xi = 0$ yields overall
clearly larger $\chi^2$ values. Comparing "Oberliga" and
"Bundesliga", we consistently find larger values of the
parameter $\xi$ for the former, indicative of the comparatively
fatter tails of these data, see Table 1 and Fig. 1. Comparing
to the results for the NBD, we do not find any
cases where the GEV distributions would provide the best 
fit to the data, so clearly the leagues considered here are 
not of the type for which Greenhough et al. [7] found bet- 
ter matches with the GEV statistics than for the NBD. 
Similar conclusions hold true for the comparison of "Bun- 
desliga" and "Frauen-Bundesliga", with the latter taking 
on the role of the "Oberliga" . 

Since the NBD represents the continuum limit of our model "A",
the good performance of the NBD fits observed so far implies
that the feedback models proposed here can indeed
capture the main characteristics of the game. To test this 
conjecture directly we performed fits of the exact distribu- 
tions resulting from the recurrence relation (4), employing 
the simplex method to minimize the total $\chi^2$ deviation for 
the home and away scores. Comparing the results of model 
"A" to the fits of the limiting NBD, we observe in Table 1 
almost identical fit qualities for the final scores. However, 
for sums and differences of scores we find a considerably 
better description by using our model "A" , indicating de- 
viations from the continuum limit there [17]. The overall 
best modeling of the league data is achieved with fits of 
model "B" which feature on average an even higher quality 
than those of model "A", cf. Table 1. We also performed 
fits to the more elaborate model "C", but found the re- 
sults rather similar to those of the simpler model "B" and 
hence do not discuss them here. 
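
The fitting step can be sketched in a simplified form. As a deliberately simpler stand-in for the simplex method, the example below scans a coarse parameter grid and keeps the pair (p_0, kappa) with the smallest chi-square for model "B". The "observed" histogram is synthetic, generated from the model itself, and the uniform bin error of 0.005 is a hypothetical value.

```python
def model_b_distribution(p0, kappa, n_steps=90, n_max=12):
    """P_N(n) for n = 0..n_max from Eq. (4) with p(n) = p0 * kappa**n."""
    probs = [min(1.0, p0 * kappa ** n) for n in range(n_max + 1)]
    dist = [1.0] + [0.0] * n_max
    for _ in range(n_steps):
        new = [(1.0 - probs[0]) * dist[0]]
        for n in range(1, n_max + 1):
            new.append((1.0 - probs[n]) * dist[n]
                       + probs[n - 1] * dist[n - 1])
        dist = new  # weight beyond n_max is dropped by the truncation
    return dist

def chi_square(observed, errors, model):
    """Sum of squared, error-weighted deviations over all bins."""
    return sum(((o - m) / e) ** 2 for o, e, m in zip(observed, errors, model))

def grid_fit(observed, errors, p0_grid, kappa_grid):
    """Return (chi2, p0, kappa) of the best grid point (not a simplex search)."""
    best = None
    for p0 in p0_grid:
        for kappa in kappa_grid:
            chi2 = chi_square(observed, errors,
                              model_b_distribution(p0, kappa))
            if best is None or chi2 < best[0]:
                best = (chi2, p0, kappa)
    return best

# Synthetic "data": the exact model at known, illustrative parameters.
observed = model_b_distribution(0.015, 1.2)
errors = [0.005] * len(observed)
best_chi2, best_p0, best_kappa = grid_fit(
    observed, errors,
    p0_grid=[0.010, 0.0125, 0.015, 0.0175, 0.020],
    kappa_grid=[1.0, 1.1, 1.2, 1.3, 1.4],
)
```

A derivative-free optimizer such as the simplex method would replace `grid_fit` in practice; the chi-square objective stays the same.
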

Comparing the leagues, we see in Table 1 that the 
parameters $\kappa$ for the "Oberliga" are significantly larger
than for the "Bundesliga", whereas the parameters $p_0$ are
slightly smaller for the "Oberliga". That is to say, scoring
a goal in a match of the East German "Oberliga" was a 
more encouraging event than in the (West) German "Bun- 
desliga". Alternatively, this observation might be inter- 
preted as a stronger tendency of the perhaps more pro- 
fessionalized teams of the (West) German premier league 
to switch to a strongly defensive mode of play in case 
of a lead. Consequently, the tails of the distributions 
Fig. 1: Histograms of final scores of home and away teams and 
corresponding fits. (a) East German "Oberliga". (b) (West) 
German "Bundesliga". (c) The qualification stage of the "FIFA 
World Cup" series. 



are slightly fatter for the "Oberliga" than for the
"Bundesliga". Recalling that the NBD form (2) is the continuum
limit of the feedback model "A", these differences
should translate into larger values of $r$ and smaller
values of $p$ for the "Bundesliga" results, which is what we
indeed observe. Conversely, computing from the NBD
parameters $r$ and $p$ the feedback parameters $p_0$ and $\kappa$
also given in Table 1, we obtain good agreement with 
the directly fitted values. Comparing the results for the 
"Frauen-Bundesliga" to those for the "Bundesliga", even 
more pronounced tails are found for the former, resulting 
in very significantly larger values of the self-affirmation
parameter $\kappa$.
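
The conversion from NBD parameters to feedback parameters amounts to inverting the continuum-limit relations. As a sketch, assuming $r = p_0/\kappa$ and $p = 1 - e^{-\kappa N}$ with $N$ time steps:

```latex
\begin{align}
p = 1 - e^{-\kappa N} \;&\Longrightarrow\; \kappa = -\frac{\ln(1-p)}{N}, \\
r = \frac{p_0}{\kappa} \;&\Longrightarrow\; p_0 = r\,\kappa = -\frac{r\,\ln(1-p)}{N}.
\end{align}
```

For illustration only (these numbers are not taken from Table 1): with $N = 90$, $r = 7.5$ and $p = 0.1647$ one obtains $\kappa \approx 0.0020$ and $p_0 \approx 0.0150$.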

Finally, we also considered the score data of the qualification
stage of the "FIFA World Cup" series from 1930 to
2002 (≈ 3400 matches) [20,21]. Compared to the domestic
league data discussed above, the results of the World
Cup show distinctly heavier tails, cf. Fig. 1. Consequently 
we obtain good fits for the heavy-tailed distributions, and, 
in particular, in this case the GEV distribution provides 
a better fit than the NBD, similar to what was found by 
Greenhough et al. [7], cf. Table 1. The fits of model "A" 
are again rather similar to the NBD. The multiplicative 
feedback model "B" , on the other hand, also handles this 
case extremely well and, for the away team, considerably 
better than the GEV distribution (3). The difference to 
the league data can be attributed to the possibly very 
large differences in skill between the opposing teams oc- 
curring since all countries are allowed to participate in the 
qualification round. The parameters in Table 1 reveal a 
remarkable similarity with the parameters of the "Frauen- 
Bundesliga" , where a similar explanation appears quite 
plausible since the very good players are concentrated in 
just two or three teams. 

We have shown that football score data can be under- 
stood from a certain class of modified binomial models 
with a built-in effect of self-affirmation of the teams upon 
scoring a goal. The NBD fitting many of the data sets 
can in fact be understood as a limiting distribution of 
our model "A" with an additive update rule of the scoring 
probability. It does not provide very good fits in cases with 
heavier tails, such as the qualification round of the "FIFA 
World Cup" series. The overall best variant is our model 
"B" , where a multiplicative update rule ensures that each 
goal motivates the team even more than the previous one. 
Basically by "interpolating" between the GEV form and 
NBD, it fits both these world-cup data as well as the data 
from the German domestic leagues extremely well, thus 
reconciling the heterogeneous phenomenological findings 
with a plausible and simple microscopic model. In gen- 
eral, we find less professionalized leagues or cups to fea- 
ture stronger scoring feedback, resulting in goal distribu- 
tions with heavier tails. It is obvious that the presented 
models with a single parameter of self-affirmation are a 
bold simplification. It is all the more surprising, then, how
well they model the considered score data, yielding
a new example of how sums of correlated variables lead 
to non-Gaussian distributions with fat tails. For a closer 
understanding of the self-affirmation effect, an analysis 
of time-resolved scoring data would be highly desirable. 
Some data of this type has been analyzed in Ref. [22], 
showing a clear increase of scoring frequency as the match 
progresses, thus supporting the presence of feedback as 
discussed here. 